- This is a rough draft - Megan 04/20/92
-
- Audio/Video Transport WG Meeting Report
- 17-Mar-92, San Diego
-
- 1. Introduction: Goals, Scope of this working group
-
- The AVT WG met for three sessions on Tuesday in San Diego. Audio from
- the presentations and discussions at these sessions was "audiocast"
- via UDP and IP multicast to participants at a number of locations
- ranging from Australia to the UK, and the remote participants were
- able to ask questions over the return path.
-
- The purpose of this working group is to specify one or more
- experimental protocols to foster interoperation among multiple packet
- audio/video implementations in experiments such as this audiocast.
- The focus of the WG is short-term (see the charter). Our first goal
- is to have the protocols defined and experimental implementations
- running in time for use in a second audiocast at the July, 1992 IETF
- meeting. Therefore, in this meeting we dove right into a discussion
- of what the protocol should look like.
-
-
- 2. Data packet header formats for real-time audio and video
-
- We need a "transport" protocol for real-time, continuous media. That
- means we don't want the retransmission and flow control of TCP, but we
- do want sequencing and checksumming. We could define a new protocol
- to fit directly over IP, but in keeping with the short-term scope of
- this working group, we choose to fit a new protocol over IP+UDP so it
- can be deployed quickly. Alternatively, another protocol that
- provides the necessary functions, such as ST-II, can be used. Those
- functions are port addressing, length, and (optional) checksumming.
-
- The missing function is sequencing. Steve Casner described the data
- packet format of the Network Voice Protocol (NVP-II) which was serving
- this function for the audiocast of this meeting. The header is
- efficient (only 4 octets), but that makes some of the fields too small
- to support current requirements. To begin discussion of a
- replacement, the following strawman protocol with only two fields was
- proposed:
-
- o 32-bit Timestamp (16 bits of seconds + 16-bit fraction)
- o Sequence Number (could be less than 32 bits)
-
- There was substantial discussion of the nature of the timestamp. It
- must have sufficient range to cover any network delay (segment
- lifetime) that might be expected, and it must have sufficient
- resolution to allow the desired degree of superposition and
- coordination among media streams. The bit allocation shown has a
- range of about 18 hours and a resolution of about 15 microseconds. The timestamp
- could be synchronous with the media sampling clock, in which case it
- would tick at the nominal sampling rate and drift with respect to real
- time, or it could be synchronous with real time. In the latter case,
- the timestamp could represent absolute real time if it were defined to
- be the middle 32 bits of a Network Time Protocol (NTP) timestamp, or
- it could be merely relative to real time.
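-
- As a rough illustration of the 16.16 bit allocation discussed above,
- the following Python sketch derives the range and resolution and
- extracts the middle 32 bits of a 64-bit NTP timestamp. The function
- and constant names are hypothetical, chosen here for illustration
- only; the strawman does not prescribe any implementation.

```python
# Sketch only: names below are assumptions, not part of the strawman.

# 16 bits of seconds wrap after 2**16 s; 16 bits of fraction tick at 2**-16 s.
RANGE_HOURS = 2**16 / 3600        # about 18.2 hours before wrap-around
RESOLUTION_US = 1e6 / 2**16       # about 15.26 microseconds per tick


def middle32_of_ntp(ntp64: int) -> int:
    """Return the middle 32 bits of a 64-bit NTP timestamp.

    An NTP timestamp is 32 bits of seconds followed by 32 bits of
    fraction, so the middle 32 bits are the low 16 bits of the seconds
    and the high 16 bits of the fraction (a 16.16 fixed-point value).
    """
    return (ntp64 >> 16) & 0xFFFFFFFF


def to_seconds(ts32: int) -> float:
    """Interpret a 16.16 timestamp as seconds (modulo 2**16 seconds)."""
    return ts32 / 65536.0


def ts_diff(a: int, b: int) -> int:
    """Signed difference a - b in ticks, tolerant of 32-bit wrap-around."""
    d = (a - b) & 0xFFFFFFFF
    return d - (1 << 32) if d >= (1 << 31) else d
```

- For example, middle32_of_ntp(0x123456789ABCDEF0) yields 0x56789ABC.
- The wrap-tolerant difference is only meaningful when the two
- timestamps are less than half the range (about 9 hours) apart.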
-
- For purposes of synchronization among multiple media sources,
- real-time timestamps should be used, though they need not be absolute.
- Julio Escobar from BBN gave a presentation on the Synchronization
- Protocol. It is based on globally synchronized clocks (e.g., using
- NTP) and defines a set of control protocol exchanges to establish an
- equalization delay for synchronized playback. Its only requirement
- on the data packet format is that a real-time-synchronous, relative
- timestamp be carried.
-
- Although the timestamp field can be used to sequence the packets, it
- cannot be used to detect lost packets for media such as voice that
- suppress transmission when there is no activity. The sequence number
- serves that function. It could be smaller than 32 bits because the
- timestamp disambiguates wrap-around within the maximum segment
- lifetime. The number of bits should be large enough that the loss of
- exactly one sequence space of packets is a rare-enough event that
- failure to detect it is acceptable.
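-
- The loss-detection role of the sequence number can be sketched as
- follows. The 16-bit width is an assumption made only for this
- example (the report leaves the width open), as are the names and the
- premise that packets arrive in order.

```python
# Sketch only: SEQ_BITS = 16 is an assumed width, not a decision of the WG.
SEQ_BITS = 16
SEQ_MOD = 1 << SEQ_BITS


def packets_lost(prev_seq: int, curr_seq: int) -> int:
    """Count packets missing between two consecutively received packets.

    Assumes in-order arrival. The count is taken modulo SEQ_MOD, so the
    loss of exactly SEQ_MOD packets in a row looks like no loss at all.
    """
    return (curr_seq - prev_seq - 1) % SEQ_MOD
```

- A silent period in a voice stream advances the timestamp but not the
- sequence number, so packets_lost() still returns 0 across the gap.
- A real loss shows up as a nonzero count even across wrap-around:
- packets_lost(65534, 1) returns 2.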
-
- Steve Deering proposed some additional fields/functions to be included
- in the data packet header:
-
- o Checksum (to validate decryption)
- o Version Number
- o Encoding Type
-
- The UDP checksum cannot be used to validate decryption because it must
- be applied after encryption, so a separate checksum would be required.
- An alternative that would not require an additional field but would
- require more complex processing is to use the successful decryption of
- several properly sequenced packets as the validation of the key. On
- the other hand, including a checksum at this level, covering either
- just the header or header plus data, would also be useful with the
- ST-II protocol that does not checksum higher-layer protocols.
-
- A version number would allow implementations to distinguish among
- multiple versions of the protocol.
-
- The encoding type field might be used for several purposes. It could
- identify the particular compression algorithm used so that the
- receiver could select the correct decompression. However, if that
- selection would be constant over the life of the session, it could be
- communicated in an out-of-band control protocol.
-
- If multiple media are sent on one port number, then an
- additional level of demultiplexing would be needed and the encoding
- field could serve that purpose. For layered (embedded) coding
- schemes, a field is needed to identify the separate layers, but this
- field might be here or might be consigned to the application-layer
- protocol. For the network to process the separate layers at different
- priorities, it is expected that some priority field would be needed in
- the network layer.
-
- Finally, two fields from other packet audio protocols were considered:
-
- o Energy Level (from Xerox PARC Phoenixphone)
- o Cumulative Delay (from CCITT G.764)
-
- For audio packets, the energy level is an indication of the sound
- volume in the packet. This may be useful to the receiver when mixing
- audio streams, for example. It could be recalculated by the receiver
- rather than being carried in the packet.
-
- The CCITT recommendation G.764 Packetized Voice Protocol includes a
- field that records the cumulative variable queueing delays experienced
- by a packet in traversing the network. This may be useful for
- deadline-scheduling of packet forwarding, but it was decided that
- those experimenting with such algorithms would need to add the field
- in some lower layer.
-
-
- 3. Field inclusion criteria
-
- We did not attempt to decide "in real time" which fields/functions
- should be included or excluded; further discussion is expected via
- email. Instead, we established some criteria for inclusion of these
- and other fields in a real-time transport protocol:
-
- - What percentage of applications would require the field? If
- only a small percentage, the field should be left to the
- application layer.
-
- - What application functions are we trying to support with these
- fields? We may be able to combine functions by choosing the
- fields right.
-
- - How should we trade off network bandwidth vs. processing and
- complexity of control algorithms? (The discussions of the
- checksum and energy fields are examples.)
-
- - Would the field be constant in all packets at a given
- demultiplexing level? If so, that information could be implicit
- and carried in an out-of-band control protocol. Or is there a
- need for the data to be self-describing?
-
- - Does the field/function "belong" at this level? Considerations
- include overlap with other layers, aesthetics, common practice
- and understanding.
-
-
- 4. Addressing
-
- In the third session we discussed how addressing (multiplexing) should
- be divided among the layers. Steve Deering explained:
-
- - The IP multicast address should identify a particular session or
- set of recipients. Two different sets of recipients should have
- two different addresses.
-
- - The destination port address must be the same for all recipients
- if the packets are to be multicast, so the destination port must
- be administratively, not dynamically, assigned. Since the space
- of well-known port numbers is small, we can't assign
- separate ports for each kind of data in a multimedia session.
- It may be appropriate to have a control port and a data port, or
- perhaps to distinguish major data types, such as audio and
- video. Source port numbers are dynamically assigned and can
- distinguish multiple participants at one IP address.
-
- - If there are multiple flows (e.g., audio and video) to one
- multicast address, it may be necessary to include another level
- of demultiplexing in the audio/video transport layer. This
- relates to the "encoding" field mentioned earlier.
-
- Further discussion is needed to decide how much multiplexing should
- occur at each layer. There are considerations both of address space
- and of implementation (whether it is better to read multiple media on
- one socket or separate sockets, for example).
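-
- One way to picture the layering described above is a receiver that
- reads everything arriving on a single socket and sorts it by source
- address, source port, and an "encoding" field. The sketch below is
- purely illustrative: the FlowKey type, the demultiplex() function,
- and the integer encoding values are all hypothetical, and the
- packets are assumed to have been parsed into tuples already.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_addr: str   # sender's IP address
    src_port: int   # dynamically assigned; separates participants on one host
    encoding: int   # hypothetical "encoding type" field from the data header


def demultiplex(packets):
    """Group (src_addr, src_port, encoding, payload) tuples by flow.

    All packets are assumed to share one multicast address and one
    destination port, so only the remaining fields distinguish flows.
    """
    flows = defaultdict(list)
    for src_addr, src_port, encoding, payload in packets:
        flows[FlowKey(src_addr, src_port, encoding)].append(payload)
    return dict(flows)
```

- Using separate sockets per medium would instead push the last level
- of this key down into the destination port, which is the trade-off
- left open above.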
-
-
- 5. Linkages between data and control
-
- Flexible management of multimedia connections or sessions is the
- subject of current research and beyond the short-term scope of this
- working group. For simple application modes, such as an audiocast on
- an advertised "channel" (e.g., IP multicast address), operation is
- possible with no control protocol at all.
-
- For spontaneous communication, there is a pool of 2^16 IP multicast
- addresses from which an address may be chosen, but then that address
- must be communicated to the participants. This group may define a
- simple interim protocol for this purpose as a second step (after the
- transport protocol). Some inputs to this process would be the
- "session protocol" used by the vat program, the Connection Control
- Protocol from ISI, and the DVC control protocol (see next section).
-
-
- 6. Software Encoding
-
- Listed as a bonus topic on the agenda was a discussion of algorithms
- and protocols for software encoding of real-time media. This is not a
- main topic because such protocols should be at a layer above the
- transport. However, in keeping with the working group goal to foster
- interoperation and experimentation with packet audio and video, it may
- be valuable to agree on some (perhaps low performance) software
- compression techniques for use until hardware is generally available.
-
- For this purpose, Paul Milazzo from BBN gave an update on the protocol
- used in the Desktop Video Conference program. DVC uses the low-cost
- VideoPix frame-grabber card for SPARCstations plus software
- compression to generate video at about 5 frames per second. The DVC
- protocol communicates sequences of video subimage blocks over UDP and
- uses TCP for the control connection. A recent enhancement is the
- ability to decode multiple streams (up to 6 so far).
-
-
- 7. Further discussion
-
- Thanks to Karen Sollins and Eve Schooler for taking the notes from
- which these minutes were prepared. A longer report of the meeting
- with more detail will be posted to the mailing list rem-conf@es.net to
- stimulate discussion of the issues raised above. It is proposed that
- we also hold some packet audio teleconference meetings as needed to
- augment the e-mail discussion.
-